Meta cuts deals with several news publishers for AI use

Engadget

It'll hoover up their data and provide real-time access to current events. Meta has cut several deals with news publishers to help provide real-time data for its AI chatbot services. The commercial agreements will allow Meta AI chatbots to better answer user queries about news and current events. These are multiyear deals under which publishers will be compensated for the use of their content, though no monetary specifics have been disclosed. The contracts do stipulate that Meta's chatbots will link out to articles when answering news queries, potentially offering a slight traffic boost to publishers.


The Company Quietly Funneling Paywalled Articles to AI Developers

The Atlantic - Technology

"You shouldn't have put your content on the internet if you didn't want it to be on the internet," Common Crawl's executive director says. The Common Crawl Foundation is little known outside of Silicon Valley. For more than a decade, the nonprofit has been scraping billions of webpages to build a massive archive of the internet. This database--large enough to be measured in petabytes--is made freely available for research.


The End of Publishing as We Know It

The Atlantic - Technology

When tech companies first rolled out generative-AI products, some critics immediately feared a media collapse. Every bit of writing, imagery, and video became suspect. But for news publishers and journalists, another calamity was on the horizon. Chatbots have proved adept at keeping users locked into conversations. They do so by answering every question, often through summarizing articles from news publishers.


Envisioning Stakeholder-Action Pairs to Mitigate Negative Impacts of AI: A Participatory Approach to Inform Policy Making

Barnett, Julia, Kieslich, Kimon, Helberger, Natali, Diakopoulos, Nicholas

arXiv.org Artificial Intelligence

The potential for negative impacts of AI has rapidly become more pervasive around the world, and this has intensified a need for responsible AI governance. While many regulatory bodies endorse risk-based approaches and a multitude of risk mitigation practices are proposed by companies and academic scholars, these approaches are commonly expert-centered and thus exclude a significant group of stakeholders. Ensuring that AI policies align with democratic expectations requires methods that prioritize the voices and needs of those impacted. In this work we develop a participatory, forward-looking approach to inform policy-makers and academics that places the needs of lay stakeholders at the forefront and enriches the development of risk mitigation strategies. Our approach (1) maps potential mitigation and prevention strategies for negative AI impacts that assign responsibility to various stakeholders, (2) explores how laypeople prioritize and weigh these strategies, and (3) presents these insights in policy fact sheets, i.e., a digestible format for informing policy processes. We emphasize that this approach is not intended to replace policy-makers; rather, our aim is to present an informative method that enriches mitigation strategies and enables a more participatory approach to policy development.


Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering

Wu, Feijie, Li, Zitao, Wei, Fei, Li, Yaliang, Ding, Bolin, Gao, Jing

arXiv.org Artificial Intelligence

Leveraging large language models (LLMs), an agent can use retrieval-augmented generation (RAG) to integrate external knowledge and increase the reliability of its responses. Current RAG-based agents draw on a single, domain-specific knowledge source, which limits their scope and leads to hallucinated or inaccurate responses on cross-domain queries. Integrating multiple knowledge bases into a unified RAG-based agent raises significant challenges, including increased retrieval overhead and data-sovereignty concerns when sensitive data is involved. In this work, we propose RopMura, a novel multi-agent system that addresses these limitations through efficient routing and planning mechanisms. RopMura features two key components: a router that selects the most relevant agents based on their knowledge boundaries, and a planner that decomposes complex multi-hop queries into manageable steps, enabling coordinated cross-domain responses. Experimental results demonstrate that RopMura handles both single-hop and multi-hop queries effectively: routing alone yields precise answers for single-hop queries, while routing combined with planning achieves accurate multi-step resolutions for complex ones.
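The abstract above describes a two-part control loop: a router that matches each query step to the agent whose knowledge boundary covers it, and a planner that splits a multi-hop question into single-hop steps. A minimal sketch of that loop, using toy stand-ins (keyword sets for knowledge boundaries, a naive "and"-split for planning) rather than RopMura's actual components:

```python
# Hypothetical sketch of a router/planner multi-agent loop. All names and
# heuristics here are illustrative assumptions, not RopMura's real API.
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    keywords: set  # crude stand-in for a learned "knowledge boundary"

    def answer(self, step: str) -> str:
        return f"[{self.name}] answer to: {step}"


class Router:
    """Pick the agent whose knowledge boundary best overlaps the query."""

    def __init__(self, agents):
        self.agents = agents

    def route(self, query: str) -> Agent:
        words = set(query.lower().split())
        return max(self.agents, key=lambda a: len(a.keywords & words))


class Planner:
    """Decompose a multi-hop query into single-hop steps (toy heuristic)."""

    def plan(self, query: str):
        return [s.strip() for s in query.split(" and ") if s.strip()]


def answer_query(query, router, planner):
    # Plan first, then route each step to its specialist and collect answers.
    return [router.route(step).answer(step) for step in planner.plan(query)]


agents = [
    Agent("medical", {"drug", "dosage", "symptom"}),
    Agent("legal", {"patent", "liability", "contract"}),
]
router, planner = Router(agents), Planner()
out = answer_query("what dosage is safe and who holds the patent",
                   router, planner)
```

In the paper, the router and planner would themselves be LLM-driven; the point here is only the control flow: plan, route each step, and collect per-agent answers for a cross-domain query.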


The Hottest Startups in Dublin in 2024

WIRED

Thanks to low corporation tax and government incentives, Dublin has hosted the European headquarters of many large US technology companies--Google, Meta, LinkedIn and Microsoft all have offices in the city's Silicon Docks. "The big US companies operated independently of the startup world for many years," explains Will Prendergast, partner at Frontline Ventures. "But in the last five years, US technology companies have been building product and engineering functions here, and that talent is starting to spill out, driving startup creation." Government support via Enterprise Ireland's Pre-Seed Start Fund, designed to accelerate early-stage startups, and hubs such as Dogpatch Labs are supporting this wave of new talent. "Ireland does have a capital issue," says Luke Mackey, co-founder of employee-benefits startup Kota.


MiRAGeNews: Multimodal Realistic AI-Generated News Detection

Huang, Runsheng, Dugan, Liam, Yang, Yue, Callison-Burch, Chris

arXiv.org Artificial Intelligence

The proliferation of inflammatory or misleading "fake" news content has become increasingly common in recent years. Simultaneously, it has become easier than ever to use AI tools to generate photorealistic images depicting any scene imaginable. Combining the two -- AI-generated fake news content -- is particularly potent and dangerous. To combat the spread of AI-generated fake news, we propose the MiRAGeNews Dataset, a collection of 12,500 high-quality real and AI-generated image-caption pairs from state-of-the-art generators. We find that our dataset poses a significant challenge to humans (60% F-1) and state-of-the-art multimodal LLMs (< 24% F-1). Using our dataset, we train a multimodal detector (MiRAGe) that improves by +5.1% F-1 over state-of-the-art baselines on image-caption pairs from out-of-domain image generators and news publishers. We release our code and data to aid future work on detecting AI-generated content.
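The detector described above scores an image-caption pair as real or AI-generated by combining signals from both modalities. A minimal late-fusion sketch, with placeholder hand-crafted features standing in for the vision and text encoders (none of this is MiRAGe's actual architecture; the weights are assumed for illustration):

```python
# Toy multimodal detector: extract features from each modality, concatenate
# (late fusion), then apply a logistic score. Purely illustrative.
import math


def image_features(pixels):
    # Placeholder: mean/variance statistics standing in for a vision encoder.
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, var]


def caption_features(caption):
    # Placeholder: length and punctuation rate standing in for a text encoder.
    n = len(caption)
    punct_rate = sum(c in ".,!?" for c in caption) / max(n, 1)
    return [n / 100.0, punct_rate]


def detect(pixels, caption, weights, bias=0.0):
    # Fuse the two feature vectors, then score with a linear + sigmoid head.
    x = image_features(pixels) + caption_features(caption)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 / (1 + math.exp(-z))  # probability the pair is AI-generated


p = detect([0.1, 0.9, 0.5, 0.3],
           "Breaking: flood hits city.",
           weights=[0.5, -1.0, 0.2, 2.0])
```

A real system would learn the encoders and classifier jointly from labeled pairs; the sketch only shows the fusion-then-classify shape of the pipeline.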


AI companies are finally being forced to cough up for training data

MIT Technology Review

AI companies have pillaged the internet for training data, and many websites and data set owners have started restricting the ability to scrape their websites. We've also seen a backlash against the AI sector's practice of indiscriminately scraping online data, in the form of users opting out of making their data available for training and lawsuits from artists, writers, and the New York Times, claiming that AI companies have taken their intellectual property without consent or compensation. My colleague James O'Donnell dissects the lawsuits in his story and points out that these lawsuits could determine the future of AI music. But this moment also sets an interesting precedent for all of generative AI development. Thanks to the scarcity of high-quality data and the immense pressure and demand to build even bigger and better models, we're in a rare moment where data owners actually have some leverage.


Google fined €250m in France for breaching intellectual property rules

The Guardian

Google has been fined €250m (£213m) by French regulators for breaching an agreement over paying media companies for reproducing their content online. France's competition watchdog said on Wednesday that it was fining the US tech company for breaches linked to intellectual property rules related to news media publishers. The regulator also cited concerns about Google's AI service. The competition authority said Google's AI-powered chatbot Bard – since rebranded as Gemini – was trained on content from publishers and news agencies without notifying them. The watchdog said in a statement that the fine was for "failing to respect commitments made in 2022" and accused Google of not negotiating in "good faith" with news publishers on how much to compensate them for use of their content.


Why is it OK for rich guys to steal my work?

Los Angeles Times

Every day, what's left of the once-mighty ranks of reporters across this country tap out stories meant to inform, entertain and expose. Sometimes they are the work of minutes, the first bits of knowledge on breaking news such as fires, storms or even elections. Sometimes they are investigations that have taken years. Inevitably, as soon as we publish, rich dudes with algorithms come in and sweep this work away for their own profit, like deodorant off a Target shelf. Retail theft is causing a civic meltdown and inspiring a ballot measure to incarcerate repeat toothpaste thieves.